Journal: Bioinformatics
Article Title: HTSinfer: inferring metadata from bulk Illumina RNA-Seq libraries
doi: 10.1093/bioinformatics/btaf076
Figure Legend Snippet: Summary of HTSinfer and case study results. In comparisons with known metadata in panels (C), (D), (E), and (F), we assigned the label “Agreement” if the HTSinfer-inferred metadata is consistent with the known metadata, and “Disagreement” otherwise. If HTSinfer did not infer metadata at all, we assigned “Unassigned” instead. Since HTSinfer determines the library source independently for each mate file in paired-end samples, in (D), (E), and (F) we assigned “Disagreement” if either of the mate files did not match the reference. (A) Overview of the HTSinfer workflow, from the FASTQ file of sequencing reads to the JSON output of metadata. The tools used to infer specific metadata (indicated in the gray boxes) are shown in the boxes with blue background. (B) Runtime (in min) for analyzing SRA samples, as a function of sample size. Means and standard deviations are shown (n = 720 samples with 1 run each). (C) Results of library source inference for the 720 SRA samples as a function of the number of randomly selected records. (D) Detailed results from the library source inference of the SRA samples (n = 720), using 1 million records per sample. Samples for which there was disagreement on the library source, or for which the library source was considered ambiguous, were further inspected. Samples for which an unambiguous call was not made were further separated into categories depending on whether the species of origin had rank 1, 2, or 3 or more based on the number of RP-mapped reads. (E) Results of library source inference on SRA samples with minimal RNA degradation (median TIN score > 70, n = 82). (F) Validation of results on the 72 samples generated in-house. As there were no disagreements with known metadata, the “Disagreement” columns were omitted.
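The labeling rule described in the legend can be sketched in a few lines of Python. This is only an illustration of the rule as worded above, not code from HTSinfer: the function name `label_sample` and the use of `None` for "no call" are hypothetical, and treating a sample with one matching mate and one uncalled mate as "Disagreement" is an assumption, since the legend does not spell out that case.

```python
from typing import Optional, Sequence


def label_sample(inferred: Sequence[Optional[str]], known: str) -> str:
    """Compare per-mate inferred library sources with the known source.

    `inferred` holds one call per mate file (one entry for single-end,
    two for paired-end); None stands for "no call" (hypothetical encoding).
    """
    # No metadata inferred for any mate file: "Unassigned".
    if all(call is None for call in inferred):
        return "Unassigned"
    # Any mate that fails to match the reference makes the whole sample
    # a "Disagreement" (assumption: an uncalled mate counts as a mismatch).
    if any(call != known for call in inferred):
        return "Disagreement"
    return "Agreement"


print(label_sample(["Homo sapiens", "Homo sapiens"], "Homo sapiens"))
print(label_sample(["Homo sapiens", "Mus musculus"], "Homo sapiens"))
print(label_sample([None, None], "Homo sapiens"))
```

Under these assumptions, the three calls yield "Agreement", "Disagreement", and "Unassigned" respectively.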
Article Snippet: HTSinfer includes a library of eighteen 12-nucleotide-long fragments of adapter sequences, manually curated from the most common Illumina sequencing kits for RNA-sequencing (https://support-docs.illumina.com/SHARE/AdapterSequences/Content/SHARE/FrontPages/AdapterSeq.htm).
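As a rough illustration of how a library of short adapter fragments might be used, the sketch below counts reads containing each 12-nucleotide fragment. The snippet does not describe HTSinfer's matching procedure, so plain substring matching is an assumption, and `count_adapter_hits` is a hypothetical helper. The three fragments are the first 12 bases of commonly published Illumina adapter sequences, chosen for illustration only; they are not claimed to be entries of HTSinfer's curated library.

```python
from collections import Counter
from typing import Iterable

# Illustrative 12-nt fragments (assumed, not HTSinfer's actual library).
ADAPTER_FRAGMENTS = (
    "AGATCGGAAGAG",  # start of the TruSeq adapter
    "CTGTCTCTTATA",  # start of the Nextera transposase adapter
    "TGGAATTCTCGG",  # start of the small RNA 3' adapter
)


def count_adapter_hits(reads: Iterable[str],
                       fragments: Iterable[str] = ADAPTER_FRAGMENTS) -> Counter:
    """Count, per fragment, how many reads contain it as an exact substring."""
    fragments = tuple(fragments)
    counts: Counter = Counter()
    for read in reads:
        for fragment in fragments:
            if fragment in read:
                counts[fragment] += 1
    return counts


reads = [
    "ACGTACGTACGTAGATCGGAAGAGCACACG",  # contains the TruSeq-like fragment
    "TTTTGGGGCCCCAAAATTTTGGGGCCCCAA",  # no adapter fragment
    "GGGGAGATCGGAAGAGTTTT",            # contains the TruSeq-like fragment
]
hits = count_adapter_hits(reads)
print(hits.most_common(1))
```

The most frequent fragment, together with the fraction of reads that contain it, would be one plausible basis for an adapter call; any threshold applied to it is left out here because the snippet does not state one.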
Techniques: Sequencing, Generated